NSF PAR Search | NSF Public Access Repository

VeBPF Many-Core Architecture for Network Functions in FPGA-based SmartNICs and IoT

Tahir, Zaid; Sanaullah, Ahmed; Bandara, Sahan; Drepper, Ulrich; Herbordt, Martin (September 2024, IEEE)

Full Text Available
Performance Evaluation of VirtIO Device Drivers for Host-FPGA PCIe Communication

https://doi.org/10.1109/IPDPSW63119.2024.00043

Bandara, Sahan; Sanaullah, Ahmed; Tahir, Zaid; Drepper, Ulrich; Herbordt, Martin (May 2024, IEEE)

Full Text Available
Further Optimizations and Analysis of Smith-Waterman with Vector Extensions

https://doi.org/10.1109/IPDPSW63119.2024.00113

Sajjadinasab, Reza; Rastaghi, Hamed; Shahzad, Hafsah; Arora, Sanjay; Drepper, Ulrich; Herbordt, Martin (May 2024, IEEE)

Full Text Available
Effortless Locality on Data Systems Using Relational Fabric

https://doi.org/10.1109/TKDE.2024.3386827

Papon, Tarikul Islam; Mun, Ju Hyoung; Karatsenidis, Konstantinos; Roozkhosh, Shahin; Hoornaert, Denis; Sanaullah, Ahmed; Drepper, Ulrich; Mancuso, Renato; Athanassoulis, Manos (January 2024, IEEE Transactions on Knowledge and Data Engineering)

A key design decision for data systems is whether they follow the row-store or the column-store paradigm. The former supports transactional workloads, while the latter is better for analytical queries. This decision has a significant impact on the entire data system architecture. The multi-decade journey of these two designs has led to a new family of hybrid transactional/analytical processing (HTAP) architectures. Several efforts have sought to reap the benefits of both worlds with systems that maintain multiple copies of data (in different physical layouts) and convert them into the desired layout as required. Due to data duplication, the additional necessary bookkeeping, and the cost of converting data between different layouts, these systems compromise between efficient analytics and data freshness. We depart from existing designs by proposing a radically new approach. We ask the question: “What if we could access any layout and ship only the relevant data through the memory hierarchy by transparently converting rows to (arbitrary groups of) columns?” To achieve this functionality, we capitalize on the reinvigorated trend of hardware specialization (that has been accelerated due to the tapering of Moore's law) to propose Relational Fabric, a near-data vertical partitioner that allows memory or storage components to perform on-the-fly transparent data transformation. By exposing an intuitive API, Relational Fabric pushes vertical partitioning to the hardware, which profoundly impacts the process of designing and building data systems. (A) There is no need for data duplication and layout conversion, making HTAP systems viable using a single layout. (B) It simplifies the memory and storage manager, which needs to maintain and update only a single data layout. (C) It reduces unnecessary data movement through the memory hierarchy, allowing for better hardware utilization and, ultimately, better performance.
In this paper, we present Relational Fabric for both memory and storage. We present our initial results on Relational Fabric for in-memory systems and discuss the challenges of building this hardware and the opportunities it brings for simplicity and innovation in the data system software stack, including physical design, query optimization, query evaluation, and concurrency control.
Full Text Available
On-the-Fly Data Transformation in Action

https://doi.org/10.14778/3611540.3611593

Mun, Ju Hyoung; Karatsenidis, Konstantinos; Papon, Tarikul Islam; Roozkhosh, Shahin; Hoornaert, Denis; Drepper, Ulrich; Sanaullah, Ahmed; Mancuso, Renato; Athanassoulis, Manos (August 2023, Proceedings of the VLDB Endowment)

Transactional and analytical database management systems (DBMS) typically employ different data layouts: row-stores for the former and column-stores for the latter. To bridge the requirements of the two without maintaining two systems and two (or more) copies of the data, our proposed system Relational Memory employs specialized hardware that transforms the base row table into arbitrary column groups at query execution time. This approach maximizes cache locality and is easy to use via a simple abstraction that allows transparent on-the-fly data transformation. Here, we demonstrate how to deploy and use Relational Memory via four representative scenarios. The demonstration uses the full-stack implementation of Relational Memory on the Xilinx Zynq UltraScale+ MPSoC platform. Conference participants will interact with Relational Memory deployed on the actual platform.
Full Text Available
Enabling VirtIO Driver Support on FPGAs

https://doi.org/10.1109/H2RC56700.2022.00006

Bandara, Sahan; Sanaullah, Ahmed; Tahir, Zaid; Drepper, Ulrich; Herbordt, Martin (November 2022, 8th International Workshop on Heterogeneous High Performance Reconfigurable Computing)

Full Text Available
Relational Fabric: Transparent Data Transformation

https://doi.org/10.1109/ICDE55515.2023.00297

Papon, Tarikul Islam; Mun, Ju Hyoung; Roozkhosh, Shahin; Hoornaert, Denis; Sanaullah, Ahmed; Drepper, Ulrich; Mancuso, Renato; Athanassoulis, Manos (April 2023, IEEE 39th International Conference on Data Engineering (ICDE'23))

A key design decision for data systems is whether they follow the row-store or the column-store paradigm. The former supports transactional workloads, while the latter is better for analytical queries. This decision has a profound impact on the entire data system architecture. The multi-decade journey of these two designs has led to a new family of hybrid transactional/analytical processing (HTAP) architectures. Several efforts have sought to reap the benefits of both worlds with systems that maintain multiple copies of data (in different physical layouts) and convert them into the desired layout as required. Due to data duplication, the additional necessary bookkeeping, and the cost of converting data between different layouts, these systems compromise between efficient analytics and data freshness. We depart from existing designs by proposing a radically new approach. We ask the question: “What if we could access any layout and ship only the relevant data through the memory hierarchy by transparently converting rows to (arbitrary groups of) columns?” To achieve this functionality, we capitalize on the reinvigorated trend of hardware specialization (that has been accelerated due to the tapering of Moore’s law) to propose Relational Fabric, a near-data vertical partitioner that allows memory or storage components to perform on-the-fly transparent data transformation. By exposing an intuitive API, Relational Fabric pushes vertical partitioning to the hardware, which has a profound impact on the process of designing and building data systems. (A) There is no need for data duplication and layout conversion, making HTAP systems viable using a single layout. (B) It simplifies the memory and storage manager, which needs to maintain and update only a single data layout. (C) It reduces unnecessary data movement through the memory hierarchy, allowing for better hardware utilization and, ultimately, better performance.
In this paper, we present Relational Fabric for both memory and storage. We present our initial results on Relational Fabric for in-memory systems and discuss the challenges of building this hardware, as well as the opportunities it brings for simplicity and innovation in the data system software stack, including physical design, query optimization, query evaluation, and concurrency control.
Full Text Available
Relational Memory: Native In-Memory Accesses on Rows and Columns

Roozkhosh, Shahin; Hoornaert, Denis; Mun, Ju Hyoung; Papon, Tarikul Islam; Sanaullah, Ahmed; Drepper, Ulrich; Mancuso, Renato; Athanassoulis, Manos (March 2023, International Conference on Extending Database Technology (EDBT'23))

Analytical database systems are typically designed to use a column-first data layout to access only the desired fields. On the other hand, storing data row-first works well for accessing, inserting, or updating entire rows. Transforming rows to columns at runtime is expensive; hence, many analytical systems ingest data in row-first form and transform it in the background to columns to facilitate future analytical queries. How will this design change if we can always efficiently access only the desired set of columns? To address this question, we present a radically new approach to data transformation from rows to columns. We build upon recent advancements in embedded platforms with re-programmable logic to design native in-memory access on rows and columns. Our approach, termed Relational Memory (RM), relies on an FPGA-based accelerator that sits between the CPU and main memory and transparently transforms base data to any group of columns with minimal overhead at runtime. This design allows accessing any group of columns as if it already exists in memory. We implement and deploy RM in real hardware, and we show that we can access the desired columns up to 1.63× faster compared to a row-wise layout, while matching the performance of pure columnar access for low projectivity, and outperforming it by up to 2.23× as projectivity (and tuple reconstruction cost) increases. Overall, RM allows the CPU to access the optimal data layout, radically reducing unnecessary data movement without high data transformation costs, thus simplifying software complexity and physical design while accelerating query execution.
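As a rough software illustration of the row-to-column-group projection this abstract describes, the Python sketch below packs a small row-first table into a byte buffer and then extracts only a requested group of columns. It is not the authors' implementation: the row layout, the names `ROW_FORMAT`, `pack_rows`, and `project`, and the sample data are all hypothetical. Unlike the hardware design in the abstract, this software version still reads every full row; it only shows what the projected result looks like.

```python
import struct

# Hypothetical row layout (an assumption for illustration):
# id: int32, price: float64, qty: int32, region: 4-byte tag.
# "<" selects little-endian with no padding, so each row is 20 bytes.
ROW_FORMAT = "<idi4s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)


def pack_rows(rows):
    """Serialize tuples into one contiguous row-first byte buffer."""
    return b"".join(struct.pack(ROW_FORMAT, *row) for row in rows)


def project(row_buffer, column_indexes):
    """Yield only the requested fields of each row, in the order given.

    This mimics the result of transforming base rows into a column
    group; it does not avoid reading the unused fields.
    """
    for offset in range(0, len(row_buffer), ROW_SIZE):
        fields = struct.unpack_from(ROW_FORMAT, row_buffer, offset)
        yield tuple(fields[i] for i in column_indexes)


rows = [(1, 9.5, 3, b"east"), (2, 2.25, 7, b"west")]
buffer = pack_rows(rows)

# Request the (price, qty) column group from the row-first buffer.
price_qty = list(project(buffer, [1, 2]))
print(price_qty)
```

An analytical query that touches only `price` and `qty` would consume `price_qty` directly, while a transactional update would keep operating on the row-first buffer.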
Full Text Available
Reinforcement Learning Strategies for Compiler Optimization in High level Synthesis

https://doi.org/10.1109/LLVM-HPC56686.2022.00007

Shahzad, Hafsah; Sanaullah, Ahmed; Arora, Sanjay; Munafo, Robert; Yao, Xiteng; Drepper, Ulrich; Herbordt, Martin (November 2022, The Eighth Workshop on the LLVM Compiler Infrastructure in HPC)

Full Text Available
Profile-driven memory bandwidth management for accelerators and CPUs in QoS-enabled platforms

https://doi.org/10.1007/s11241-022-09382-x

Sohal, Parul; Tabish, Rohan; Drepper, Ulrich; Mancuso, Renato (April 2022, Real-Time Systems)

The proliferation of multi-core, accelerator-enabled embedded systems has introduced new opportunities to consolidate real-time systems of increasing complexity. But the road to building confidence in the temporal behavior of co-running applications has presented formidable challenges. Most prominently, the main memory subsystem represents a performance bottleneck for both CPUs and accelerators. Moreover, industry-viable frameworks for full-system main memory management and performance analysis are overdue. In this paper, we propose our Envelope-aWare Predictive model, or E-WarP for short. E-WarP is a methodology and technological framework to (1) analyze the memory demand of applications following a profile-driven approach; (2) make realistic predictions on the temporal behavior of workloads deployed on CPUs and accelerators; and (3) perform saturation-aware system consolidation. This work aims at providing the technological foundations as well as the theoretical grassroots for truly workload-aware analysis of real-time systems. This work combines traditional CPU-centric bandwidth regulation techniques with state-of-the-art hardware support for memory traffic shaping via the ARM QoS extensions. We make three key observations. First, our profile-driven methodology achieves, on average, 6% over-prediction on the runtime of bandwidth-regulated applications. Second, we experimentally validate that the calculated bounds hold system-wide if the main memory subsystem operates below saturation. Third, we show that the E-WarP methodology is practical even when applications exhibit input-dependent memory access patterns. We provide a full implementation of our techniques on a commercial platform (NXP S32V234).
Full Text Available